Improving reinforcement learning algorithms: Towards optimal learning rate policies

Abstract

This paper shows how to use results of statistical learning theory and stochastic algorithms to have a better understanding of the convergence of Reinforcement Learning (RL) once it is formulated as a fixed point problem. This can be used to propose improvements of RL learning rates. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\le \beta \le 1$, and $N$ the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner and outer levels. In the inner level, we present the PASS algorithm (for “PAst Sign Search”) which, based on a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. In the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our selection methodology of the learning rate significantly outperforms standard algorithms used in RL for the three following applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
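The abstract does not spell out the PASS update rule, so the following is only a rough sketch of one sign-based scheme (a Kesten-type rule), not the authors' algorithm. It illustrates the general idea of building a new learning-rate sequence from a predefined one, applied to a toy drift-estimation problem. The names `base_rate` and `estimate_drift`, the choice $\gamma_k = 1/(k+1)$, and the Gaussian noise model are all assumptions made for illustration.

```python
import random


def base_rate(k):
    """Predefined learning-rate sequence gamma_k = 1 / (k + 1) (assumed choice)."""
    return 1.0 / (k + 1)


def estimate_drift(samples, x0=0.0, sign_based=True):
    """Stochastic-approximation estimate of the mean (drift) of `samples`.

    sign_based=True : the index into the predefined sequence advances only
                      when two consecutive increments have opposite signs
                      (a Kesten-type rule, assumed here for illustration).
    sign_based=False: the index advances at every iteration, which with
                      gamma_k = 1 / (k + 1) reproduces the running average.
    """
    x = x0
    k = 0
    prev_increment = 0.0
    for n, sample in enumerate(samples):
        increment = sample - x
        if sign_based:
            # A sign change suggests the iterate oscillates around the
            # target, so move on to the next (smaller) learning rate.
            if increment * prev_increment < 0:
                k += 1
        else:
            k = n
        x += base_rate(k) * increment
        prev_increment = increment
    return x


if __name__ == "__main__":
    random.seed(0)
    true_drift = 0.5  # hypothetical value for the toy experiment
    data = [true_drift + random.gauss(0.0, 1.0) for _ in range(20000)]
    print("standard  :", estimate_drift(data, sign_based=False))
    print("sign-based:", estimate_drift(data, sign_based=True))
```

In this sketch the learning rate stays constant while consecutive increments point the same way, so the iterate can move quickly when it is far from the target, and the rate shrinks only once the iterate starts to oscillate.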

Related articles

Optimal policy switching algorithms for reinforcement learning

We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup & Singh, 1999; Precup, 2000). We derive gradient-based algor...

Analysis of a Method Improving Reinforcement Learning Agents' Policies

Reinforcement learning (RL) is a kind of machine learning. It aims to optimize agents’ policies by adapting the agents to an environment according to rewards. In this paper, we propose a method for improving policies by using the stochastic knowledge that reinforcement learning agents obtain. We use a Bayesian Network (BN), which is a stochastic model, as the knowledge of an agent. Its structure i...

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning, that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate ...

Imitative Policies for Reinforcement Learning

We discuss a reinforcement learning framework where learners observe experts interacting with the environment. Our approach is to construct from these observations exploratory policies which favor selection of actions the expert has taken. This imitation strategy can be applied at any stage of learning, and requires neither that information regarding reinforcement be conveyed from the expert to...

Algorithms for Reinforcement Learning

Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner’s predictions. Further, the predictions may have long term effects through influ...

Journal

Journal title: Mathematical Finance

Year: 2023

ISSN: 0960-1627, 1467-9965

DOI: https://doi.org/10.1111/mafi.12378